I am working on two types of signature-level modeling. The first is subsampled models, in which we subsample to every \(100^{th}\) \(x\) location. The second is attempting to use an entire signature and model out the structure of the signature itself.
I haven’t been able to come up with a clean, reproducible way to do these models yet. However, I have some examples from each barrel of how things vary across the 5-phased models.
| random_effect | phase_1 | phase_2 | phase_3 | phase_4 | phase_5 |
|---|---|---|---|---|---|
| operator | 0.0425 | 0.0277 | 0.0009 | 0.0235 | 0.0000 |
| bullet | 0.0429 | 0.0975 | 0.0980 | 0.0917 | 0.0453 |
| machine | 0.0000 | 0.0017 | 0.0003 | 0.0072 | 0.0000 |
| residual | 0.7059 | 0.6837 | 0.6686 | 0.7245 | 0.7760 |
| random_effect | phase_1 | phase_2 | phase_3 | phase_4 | phase_5 |
|---|---|---|---|---|---|
| operator | 0.0000 | 0.0000 | 0.0000 | 0.000 | 0.000 |
| bullet | 0.0154 | 0.0931 | 0.0738 | 0.000 | 0.000 |
| machine | 0.0000 | 0.0291 | 0.0000 | 0.000 | 0.000 |
| residual | 1.1042 | 1.1494 | 1.1861 | 1.222 | 1.098 |
| random_effect | phase_1 | phase_2 | phase_3 | phase_4 | phase_5 |
|---|---|---|---|---|---|
| operator | 0.0000 | 0.0000 | 0.0000 | 0.0187 | 0.0019 |
| bullet | 0.0357 | 0.0296 | 0.0287 | 0.0199 | 0.0284 |
| machine | 0.0000 | 0.0000 | 0.0144 | 0.0169 | 0.0000 |
| residual | 0.5979 | 0.5716 | 0.5835 | 0.6046 | 0.6051 |
One approach to modeling signatures is to iteratively remove elements of the signature which contribute to underlying structure, or signal. R&R studies aim to measure the variability - we can think of this as the noise - surrounding a particular structure. The structure, which we can think of as the signal in this case, is of less interest than the variability patterns surrounding it.
However, lots of elements can contribute to the structure. In our case, some of the study design elements contribute.
Pairwise comparisons pipeline in bulletxtrctr uses expand.grid to create a grid of all pairwise comparisons for a set of LEA scans. This results in each pair of LEAs being compared twice. For example:
leas <- c("Land A", "Land B", "Land C")
expand.grid(land1 = leas, land2 = leas, stringsAsFactors = F)
## land1 land2
## 1 Land A Land A
## 2 Land B Land A
## 3 Land C Land A
## 4 Land A Land B
## 5 Land B Land B
## 6 Land C Land B
## 7 Land A Land C
## 8 Land B Land C
## 9 Land C Land C
So we have “Land B - Land A” compared in row 2, and “Land A - Land B” compared in row 4. These are the same pairwise comparison.
In reality, we only need “Land A - Land B” to be compared ONCE.
For modeling the variability of pairwise comparison scores, allowing each pair to be compared twice could artifically reduce the variability present in scores by duplicating scores and increasing sample sizes.
Function make_comparison_grid which takes in a list of LEA scan IDs and returns a data frame with columns land1, land2, and pairing_id. Instead of using expand.grid, it uses the combn function to find all combinations of size 2 within the vector of scan IDs.
make_comparison_grid <- function(scan_ids, self_comparisons = T){
combos <- combn(scan_ids, 2) %>% t()
combos_df <- data.frame(land1 = combos[,1], land2 = combos[,2])
if(self_comparisons == T){
self_combos <- data.frame(land1 = scan_ids, land2 = scan_ids)
combos_df <- rbind(combos_df, self_combos) %>%
mutate(pairing_id = paste0(land1, "_", land2))
}
else(combos_df <- combos_df %>%
mutate(pairing_id = paste0(land1, "_", land2)))
return(combos_df)
}
make_comparison_grid(scan_ids = leas)
## land1 land2 pairing_id
## 1 Land A Land B Land A_Land B
## 2 Land A Land C Land A_Land C
## 3 Land B Land C Land B_Land C
## 4 Land A Land A Land A_Land A
## 5 Land B Land B Land B_Land B
## 6 Land C Land C Land C_Land C
You can also choose whether to include self_comparisons, which would be comparing “Land A - Land A”, etc. This is the equivalent to including the diagonal in a grid of comparisons. The default is to include these comparisons.
make_comparison_grid(scan_ids = leas, self_comparisons = F)
## land1 land2 pairing_id
## 1 Land A Land B Land A_Land B
## 2 Land A Land C Land A_Land C
## 3 Land B Land C Land B_Land C
This is a relatively simple solution, but will significantly reduce computation time when completing pairwise comparisons because it reduces the required number of comparisons from \(n^2\) to \(\frac{n(n+1)}{2}\) - nearly in half. An example of how this would update the comparison of two bullets is seen below:
You still get all the important comparisons, but you have about half the computing time required to actually get to all those RFscores!
Another thing to note is that even if you only calculate this reduced number of comparisons, you can still generate the visuals for the “full grid” by re-generating the alternative pairing ID’s. You can generate an alternate version, using NA values when it is a self-comparison (i.e., a scan to itself). Then once the alternate name is generated, you can gather the two names, which fills in the RFscore values for both versions of the name. Thus, you can plot the whole grid! The function is below.
expand_grid_visual <- function(pairing_id, rfscore){
comp_grid <- data.frame(pairing_id = pairing_id, rfscore = rfscore)
comp_grid <- comp_grid %>% mutate(pairing_id_alt = purrr::map_chr(as.character(pairing_id), .f = function(pairing_id){
land1 = strsplit(pairing_id, "_")[[1]][1]
land2 = strsplit(pairing_id, "_")[[1]][2]
alt_id = ifelse(land1 == land2, NA, paste0(land2, "_", land1))
return(alt_id)
}))
comp_grid_long <-
comp_grid %>%
mutate(pairing_id = as.character(pairing_id)) %>%
gather(c(pairing_id, pairing_id_alt),
key = "id_method", value = "id_value") %>%
mutate(id_dummy = id_value) %>%
separate(col = id_dummy, into = c("land1", "land2"), sep = "_") %>%
filter(!is.na(id_value))
return(comp_grid_long)
}
A plot of the reduced and then back-expanded RFscores is below:
## Linear mixed model fit by REML ['lmerMod']
## Formula: rfscore ~ unique_id_l1 + (1 | bullet_gt/unique_id_l1) + (1 |
## operator_gt) + (1 | machine_gt) + (1 | operator_gt:bullet_gt) +
## (1 | operator_gt:machine_gt) + (1 | bullet_gt:machine_gt)
## Data: orange_pairs
## REML criterion at convergence: -38198.81
## Random effects:
## Groups Name Std.Dev.
## unique_id_l1:bullet_gt (Intercept) 0.1310007
## bullet_gt:machine_gt (Intercept) 0.0009005
## operator_gt:machine_gt (Intercept) 0.0017356
## operator_gt:bullet_gt (Intercept) 0.0137432
## machine_gt (Intercept) 0.0018698
## operator_gt (Intercept) 0.0049594
## bullet_gt (Intercept) 0.0976507
## Residual 0.1631489
## Number of obs: 48606, groups:
## unique_id_l1:bullet_gt, 12; bullet_gt:machine_gt, 4; operator_gt:machine_gt, 4; operator_gt:bullet_gt, 4; machine_gt, 2; operator_gt, 2; bullet_gt, 2
## Fixed Effects:
## (Intercept) unique_id_l1BL O-2 unique_id_l1BL O-3
## 0.87994 0.11334 0.08562
## unique_id_l1BL O-4 unique_id_l1BL O-5 unique_id_l1BL O-6
## -0.01999 0.10679 -0.39001
## convergence code 0; 1 optimizer warnings; 0 lme4 warnings
## Linear mixed model fit by REML ['lmerMod']
## Formula: rfscore ~ unique_id_l1 + (1 | bullet_gt/unique_id_l1) + (1 |
## operator_gt) + (1 | machine_gt) + (1 | operator_gt:bullet_gt) +
## (1 | operator_gt:machine_gt) + (1 | bullet_gt:machine_gt)
## Data: orange_pairs_reduced
## REML criterion at convergence: -19423.73
## Random effects:
## Groups Name Std.Dev.
## unique_id_l1:bullet_gt (Intercept) 0.134045
## bullet_gt:machine_gt (Intercept) 0.001665
## operator_gt:machine_gt (Intercept) 0.003021
## operator_gt:bullet_gt (Intercept) 0.016711
## machine_gt (Intercept) 0.002516
## operator_gt (Intercept) 0.001960
## bullet_gt (Intercept) 0.096396
## Residual 0.162611
## Number of obs: 24573, groups:
## unique_id_l1:bullet_gt, 12; bullet_gt:machine_gt, 4; operator_gt:machine_gt, 4; operator_gt:bullet_gt, 4; machine_gt, 2; operator_gt, 2; bullet_gt, 2
## Fixed Effects:
## (Intercept) unique_id_l1BL O-2 unique_id_l1BL O-3
## 0.88139 0.11221 0.08442
## unique_id_l1BL O-4 unique_id_l1BL O-5 unique_id_l1BL O-6
## -0.02105 0.10573 -0.38681
## convergence code 0; 1 optimizer warnings; 0 lme4 warnings